Abstract

Recent work of [Dasgupta-Kumar-Sarlos, STOC 2010] gave a sparse Johnson-Lindenstrauss
transform and left as a main open question whether their construction could be efficiently
derandomized. We answer their question affirmatively by giving an alternative proof of their
result requiring only bounded independence hash functions. Furthermore, the sparsity bound
obtained in our proof is improved. Our work implies the first implementation of a
Johnson-Lindenstrauss transform in data streams with sublinear update time.

1 Introduction 

The Johnson-Lindenstrauss lemma states the following. 

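Lemma 1 (JL Lemma [20]). For any integer $d > 0$, and any $0 < \varepsilon, \delta < 1/2$, there exists a
probability distribution on $k \times d$ real matrices for $k = \Theta(\varepsilon^{-2} \log(1/\delta))$ such that for any
$x \in \mathbb{R}^d$ with $\|x\|_2 = 1$,

\[ \Pr_A\left[\, \left| \|Ax\|_2^2 - 1 \right| > \varepsilon \,\right] < \delta. \]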

Several proofs of the JL lemma exist in the literature [H 13 \TT\ [T^ \TE[ [20l [23] , and it is known 
that the dependence on $k$ is tight up to a constant factor [19] (see also Section A for another
proof). However, these proofs of the JL lemma give a distribution over dense matrices, where each
column has at least a constant fraction of its entries being non-zero, and thus naively performing
the matrix-vector multiplication is costly. Recently, Dasgupta, Kumar, and Sarlos [10] proved
the JL lemma where each matrix in the support of their distribution only has $\alpha$ non-zero entries
per column, for $\alpha = O(\varepsilon^{-1} \log(1/\delta) \log^2(k/\delta))$. This reduces the time to perform dimensionality
reduction from the naive $O(k \cdot \|x\|_0)$ to $O(\alpha \cdot \|x\|_0)$, where $x$ has $\|x\|_0$ non-zero entries.
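To make the cost of the dense approach concrete, here is a minimal sketch in plain Python of the dense sign-matrix embedding discussed above. The function names, the dimensions `d` and `k`, and the fixed seed are illustrative assumptions, not values from the paper; the matrix entries are drawn fully at random rather than from a bounded-independence family.

```python
import math
import random

def sign_jl_embed(x, k, rng):
    """Multiply x by a fresh k x d matrix with i.i.d. +-1/sqrt(k) entries."""
    scale = 1.0 / math.sqrt(k)
    y = []
    for _ in range(k):
        # One dense row: every coordinate of x is touched, so the total
        # work is proportional to k times the number of entries of x.
        row_sum = sum(xj if rng.random() < 0.5 else -xj for xj in x)
        y.append(scale * row_sum)
    return y

def squared_norm(v):
    return sum(t * t for t in v)

rng = random.Random(0)          # illustrative seed
d, k = 200, 512                 # illustrative dimensions
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
norm = math.sqrt(squared_norm(x))
x = [t / norm for t in x]       # unit vector, as in Lemma 1
y = sign_jl_embed(x, k, rng)
distortion = abs(squared_norm(y) - 1.0)
```

The inner loop visits all `d` coordinates for each of the `k` rows, which is the cost that the sparse construction of [10] is meant to avoid.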

The construction of [10] involved picking two random hash functions $h : [d\alpha] \to [k]$ and
$\sigma : [d\alpha] \to \{-1, 1\}$ (we use $[n]$ to denote $\{1, \ldots, n\}$), and thus required $\Omega(d\alpha \cdot \log k)$ bits of seed to
represent a random matrix from their JL distribution. They then left two main open questions:
(1) derandomize their construction to require fewer random bits to select a random JL matrix, for
applications in e.g. streaming settings where storing a long random seed is prohibited, and (2)
understand the dependence on $\delta$ that is required in $\alpha$.

We give an alternative proof of the main result of [10] that yields progress for both (1) and
(2) above simultaneously. Specifically, our proof yields a value of $\alpha$ that is improved by a $\log(k/\delta)$
factor. Furthermore, our proof only requires that $h$ be $r_h$-wise independent and $\sigma$ be $r_\sigma$-wise
independent for $r_h = O(\log(k/\delta))$ and $r_\sigma = O(\log(1/\delta))$, and thus a random sparse JL matrix can
be represented using only $O(\log(k/\delta) \log(d\alpha + k)) = O(\log(k/\delta) \log d)$ bits (note $k$ can be assumed
less than $d$, else the JL lemma is trivial, in which case also $\log(d\alpha) = O(\log d)$). We remark that [10]
asked exactly this question: whether the random hash functions used in their construction could
be replaced by functions from bounded independence hash families. The proof in [10] required
use of the FKG inequality [HI Theorem 6.2.1], and they suggested that one approach to a proof
that bounded independence suffices might be to prove some form of this inequality under bounded
independence. Our approach is completely different, and does not use the FKG inequality at all.
Rather, the main ingredient in our proof is the Hanson-Wright inequality [15], a central moment
bound for quadratic forms in terms of the Frobenius and operator norms of the associated matrix.

We now give a formal statement of the main theorem of this work, which is a derandomized JL 
lemma where every matrix in the support of the distribution has good column sparsity. 

Theorem 2 (Main Theorem). For any integer $d > 0$, and any $0 < \varepsilon, \delta < 1/2$, there exists a family
$\mathcal{A}$ of $k \times d$ real matrices for $k = \Theta(\varepsilon^{-2} \log(1/\delta))$ such that for any $x \in \mathbb{R}^d$,

\[ \Pr_{A \in \mathcal{A}}\left[\, \|Ax\|_2 \notin [(1 - \varepsilon)\|x\|_2, (1 + \varepsilon)\|x\|_2] \,\right] < \delta, \]

and where $A \in \mathcal{A}$ can be sampled using $O(\log(k/\delta) \log d)$ random bits. Every matrix in $\mathcal{A}$ has at
most $\alpha = O(\varepsilon^{-1} \log(1/\delta) \log(k/\delta))$ non-zero entries per column, and thus $Ax$ can be evaluated in
$O(\alpha \cdot \|x\|_0)$ time if $A$ is written explicitly in memory. If $A \in \mathcal{A}$ is not written explicitly in memory
but rather we are given a string of length $\log(|\mathcal{A}|)$ representing some matrix $A \in \mathcal{A}$, then the multiplication
$Ax$ can be performed in $O(\alpha \cdot \|x\|_0 + t(\alpha \cdot \|x\|_0, O(\log(k/\delta)), d\alpha, k) + t(\alpha \cdot \|x\|_0, O(\log(1/\delta)), d\alpha, 2))$
time. Here $t(s, r, n, m)$ is the total time required to evaluate a random hash function drawn from
an $r$-wise independent family mapping $[n]$ into $[m]$ on $s$ inputs.

We stated the time to multiply $Ax$ above in terms of the $t(\cdot)$ function since one can evaluate
an $r$-wise independent hash function on multiple points quickly via fast multipoint polynomial
evaluation. Specifically, an $r$-wise independent hash family over a finite field can consist of
degree-$(r - 1)$ polynomials, and a degree-$(r - 1)$ polynomial over a field can be evaluated on $r - 1$ points
in only $O(r \log^2 r \log\log r)$ field operations as opposed to $O(r^2)$ operations [31, Ch. 10].
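As an illustration of the polynomial hash family mentioned above, the following is a minimal sketch that draws a random polynomial of degree at most $r - 1$ over a prime field and evaluates it point by point with Horner's rule. It deliberately uses the simple $O(r)$-per-point evaluation rather than fast multipoint evaluation. The name `make_poly_hash`, the prime `p`, and the final reduction modulo `m` are assumptions made for this example; reducing modulo `m` makes the output only approximately uniform unless `m` divides `p`.

```python
import random

def make_poly_hash(r, p, m, rng):
    """Draw a random polynomial of degree at most r - 1 over Z_p (p assumed prime).

    The returned function maps an integer key u in [0, p) to a bucket in [0, m).
    """
    coeffs = [rng.randrange(p) for _ in range(r)]

    def h(u):
        acc = 0
        # Horner's rule: r multiplications and additions per evaluated point.
        for c in reversed(coeffs):
            acc = (acc * u + c) % p
        # Hypothetical range reduction to [m]; slightly non-uniform unless m | p.
        return acc % m

    return h
```

For instance, `make_poly_hash(4, 10007, 16, random.Random(1))` returns a function drawn from a family of cubic polynomials, with buckets in `range(16)`.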

We also show a variant of our main result: that it is also possible to take $\alpha = \varepsilon^{-(1+o_\delta(1))} \log^2(1/\delta)$
and set $r_h = r_\sigma = O(\log(1/\delta))$. Here $o_\delta(1)$ denotes a function that goes to 0 as $\delta \to 0$ (specifically
the function is $O(1/\log(1/\delta))$). This matches the best previously known seed length for JL of
$O(\log(1/\delta) \log d)$ bits, and we still achieve good column sparsity.

Implication for the streaming model. In the turnstile model of streaming [27], a high-dimensional
vector $x \in \mathbb{R}^d$ receives several updates of the form "$(i, v)$" in a stream which causes
the change $x_i \leftarrow x_i + v$, where $(i, v) \in \{1, \ldots, n\} \times \{-M, \ldots, M\}$ for some positive integer $M$.

Indyk first considered^1 the problem of maintaining a low-dimensional $\ell_2$-embedding of $x$ in the
turnstile model in [17], where he suggested using a pseudorandom Gaussian matrix generated using
Nisan's pseudorandom generator (PRG) [28]. Clarkson and Woodruff later showed that the entries
can be $r$-wise independent Bernoulli for $r = O(\log(1/\delta))$ [9]. Both of these results though give an
algorithm whose update time (the time required to process a stream update) is $\Omega(k)$. Using the
matrix of [10] would give an update time of $O(\alpha)$ (in addition to the time required to evaluate the
$O(\log(k/\delta))$-wise independent hash function), except that their construction requires superlinear
($\Omega(d\alpha \log k)$) space to store the hash function. As noted in [10], it is unclear how to use Nisan's
PRG to usefully derandomize their construction since evaluating the PRG would require $\Omega(k)$
time.^2 Our derandomization thus gives the first update time for $\ell_2$ embedding in data streams
which is subquadratic in $1/\varepsilon$.

^1 As noted in [17], the AMS sketch of [5] does not give an $\ell_2$ embedding since the median operator is used to
achieve low error probability.
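The update-time argument can be illustrated with a small sketch of a linear sketch $y = Ax$ maintained under turnstile updates. This is only a sketch under stated assumptions: it uses a matrix with a single non-zero per column (the hashing form recalled later in the paper, which needs a bound on $\|x\|_\infty$), and it stores `h` and `sigma` as fully random tables for brevity, which is exactly the linear-space cost the derandomization is meant to remove. The class name `SparseSketch` is hypothetical.

```python
import random

class SparseSketch:
    """Maintains y = A x under turnstile updates, one non-zero per column of A."""

    def __init__(self, d, k, seed=0):
        rng = random.Random(seed)
        # Assumption for brevity: h and sigma are stored as explicit random
        # tables. A bounded-independence family would replace these tables
        # with a short seed and an evaluation routine.
        self.h = [rng.randrange(k) for _ in range(d)]
        self.sigma = [rng.choice((-1, 1)) for _ in range(d)]
        self.y = [0.0] * k

    def update(self, i, v):
        # The stream update x_i <- x_i + v changes a single sketch coordinate,
        # so the update time is proportional to the column sparsity.
        self.y[self.h[i]] += self.sigma[i] * v
```

Because the sketch is linear in $x$, the state after any sequence of updates depends only on the final vector, not on the order of the updates.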



2 Related Work 

There have been two separate lines of related work: one line of work on constructing JL families^3
such that the dimensionality reduction can be performed quickly, and another line of work on
derandomizing the JL lemma so that a random matrix from some JL family can be selected using
few random bits. We discuss both here.



2.1 Works on efficient JL embeddings 

Here and throughout, for a JL family $\mathcal{A}$ we use the term embedding time to refer to the running
time required to perform a matrix-vector multiplication for an arbitrary $A \in \mathcal{A}$. The first work to
give a JL family with embedding time potentially better than $O(kd)$ was by Ailon and Chazelle
[2]. The authors achieved embedding time $O(d \log d + k \log^2(1/\delta))$. Later, improvements were
given by Ailon and Liberty in [3, 4]. The work of [3] achieves embedding time $O(d \log k)$ when
$k = O(d^{1/2-\gamma})$ for an arbitrarily small constant $\gamma > 0$, and [4] achieves embedding time $O(d \log d)$
and no restriction on $k$, though the $k$ in their JL family is $O(\varepsilon^{-4} \log(1/\delta) \log^4 d)$ as opposed to the
$O(\varepsilon^{-2} \log(1/\delta))$ bound of the standard JL lemma. This dependence on $1/\varepsilon$ was recently improved
to quadratic by Krahmer and Ward [22], though the $\log^4 d$ factor remains. The works of Hinrichs
and Vybiral [16] and later Vybiral [32] considered taking a random partial circulant matrix as the
embedding matrix. This gives embedding time $O(d \log d)$ via the Fast Fourier transform, and it
was shown that one can take either $k = O(\varepsilon^{-2} \log^3(1/\delta))$ [16] or $k = O(\varepsilon^{-2} \log(1/\delta) \log(d/\delta))$ [32].
Liberty, Ailon, and Singer [23] achieve embedding time $O(d)$ when $k = O(d^{1/2-\gamma})$, but their JL
family only applies for $x$ satisfying $\|x\|_\infty \le \|x\|_2 \cdot k^{-1/2} d^{-\gamma}$.

None of the above works, however, can take advantage of the situation when $x$ is sparse to achieve
faster embedding time. The first work which could take advantage of sparse $x$ was that of Dasgupta,
Kumar, and Sarlos [10], who gave a JL family whose matrices all had $O(\varepsilon^{-1} \log(1/\delta) \log^2(k/\delta))$
non-zero entries per column. They also showed that for a large class of constructions, sparsity
$\min\{\varepsilon^{-2}, \varepsilon^{-1}\sqrt{\log_k(1/\delta)}\}$ is necessary when $\delta = o(1)/d^2$.

Other related works include [8] and [30]. Implicitly in [8], and later more explicitly in [30], a
JL family was given with column sparsity 1 using only constant-wise independent hash functions.
The construction was in fact the same as in [10], but with $h$ being pairwise independent, and $\sigma$
being 4-wise independent. This construction only gives a JL family for constant $\delta$ though, since
with such mild independence assumptions on $h, \sigma$ one needs $k$ to be polynomially large in $1/\delta$.

^2 The evaluation time is at least linear in the seed length, which is at least the space usage of the machine being
fooled ($\Omega(k)$ space in this case).

^3 In many known proofs of the JL lemma, the distribution over matrices in Lemma 1 is obtained by picking a
matrix uniformly at random from some set $\mathcal{A}$. In such a case, we call $\mathcal{A}$ a JL family.
2.2 Works on derandomizing the JL lemma

The $\ell_2$-streaming algorithm of Alon, Matias, and Szegedy [5] implies a JL family with seed length
$O(\log d)$ and with $k = O(1/(\varepsilon^2 \delta))$. Karnin, Rabani, and Shpilka [21] recently gave a family with
seed length $(1 + o(1)) \log_2 d + O(\log^2(1/\varepsilon))$ also with $k = \mathrm{poly}(1/(\varepsilon\delta))$. The best known seed length
for a JL family we are aware of is due to Clarkson and Woodruff [9]. Theorem 2.2 of [9] implies that
a scaled random Bernoulli matrix with $\Omega(\log(1/\delta))$-wise independent entries satisfies the JL lemma,
giving seed length $O(\log(1/\delta) \cdot \log d)$. In Section B we show how to bootstrap the $r$-wise independent
JL family construction to achieve seed length $O(\log d + \log(1/\varepsilon) \log(1/\delta) + \log(1/\delta) \log\log(1/\delta))$.
We note that a construction which achieves this seed length for $\delta < d^{-\Omega(1)}$ was recently achieved
independently by Meka [25].

Derandomizing the JL lemma is also connected to pseudorandom generators (PRGs) against
degree-2 polynomial threshold functions (PTFs) over the hypercube [12, 26]. A degree-$t$ PTF is
a function $f : \{-1, 1\}^d \to \{-1, 1\}$ which can be represented as the sign of a degree-$t$ $d$-variate
polynomial. A PRG that $\delta$-fools degree-$t$ PTFs is a function $F : \{-1, 1\}^s \to \{-1, 1\}^d$ such that for
any degree-$t$ PTF $f$,

\[ \left| \mathbf{E}_{z \in U^s}[f(F(z))] - \mathbf{E}_{x \in U^d}[f(x)] \right| < \delta, \]

where $U^m$ is the uniform distribution on $\{-1, 1\}^m$.

Note that the conclusion of the JL lemma can be rewritten as

\[ \mathbf{E}_A\left[ I_{[1-\varepsilon, 1+\varepsilon]}(\|Ax\|_2^2) \right] \ge 1 - \delta, \]

where $I_{[a,b]}$ is the indicator function of the interval $[a, b]$, and furthermore $A$ can be taken to have
random $\pm 1/\sqrt{k}$ entries [1]. Noting that $I_{[a,b]}(z) = (\mathrm{sign}(z - a) - \mathrm{sign}(z - b))/2$ and using linearity
of expectation, we see that any PRG which $\delta$-fools $\mathrm{sign}(p(x))$ for degree-$t$ polynomials $p$ must also
$\delta$-fool $I_{[a,b]}(p(x))$. Now, for fixed $x$, $\|Ax\|_2^2$ is a degree-2 polynomial over the boolean hypercube
in the variables $A_{i,j}$ and thus a PRG which $\delta$-fools degree-2 PTFs also gives a JL family with the
same seed length. Each of [12, 26] thus gives JL families with seed length $\mathrm{poly}(1/\delta) \cdot \log d$. Also,
it can be shown via the probabilistic method that there exist PRGs for degree-2 PTFs with seed
length $O(\log(1/\delta) + \log d)$ (see Section B of the full version of [26] for a proof), and it remains an
interesting open problem to achieve this seed length with an explicit construction. It is also not
too hard to show that any JL family $\mathcal{F}$ must have seed length $\Omega(\log(1/\delta) + \log(d/k))$.^4

Other derandomizations of the JL lemma include the works [13] and [29]. A common application
of the JL lemma is the case where there are $n$ vectors $x_1, \ldots, x_n \in \mathbb{R}^d$ and one wants to find a
matrix $A \in \mathbb{R}^{k \times d}$ to preserve $\|x_i - x_j\|_2$ to within relative error $\varepsilon$ for all $i, j$. In this case, one can
set $\delta = 1/n^2$ and apply Lemma 1, then perform a union bound over all $i, j$ pairs. The works of
[13, 29] do not give JL families, but rather give deterministic algorithms for finding such a matrix
$A$ in the case that the vectors are known up front.

^4 We need $|\mathcal{F}| \ge 1/\delta$ to have error probability $\delta$. Also, if $|\mathcal{F}| < d/k$, then the matrix obtained by concatenating all
rows of matrices in $\mathcal{F}$ has a non-trivial kernel, implying a vector exists in the intersection of all their kernels.

3 Conventions and Notation

Definition 3. For $A \in \mathbb{R}^{n \times n}$, we define the Frobenius norm of $A$ as $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$.

Definition 4. For $A \in \mathbb{R}^{n \times n}$, we define the operator norm of $A$ as

\[ \|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2. \]

In the case $A$ has all real eigenvalues (e.g. it is symmetric), we also have that $\|A\|_2$ is the largest
magnitude of an eigenvalue of $A$.

Throughout this paper, $\varepsilon$ is the quantity given in Lemma 1, and is assumed to be smaller than
some absolute constant $\varepsilon_0 > 0$. All logarithms are base-2 unless explicitly stated otherwise. Also,
for a positive integer $n$ we use $[n]$ to denote the set $\{1, \ldots, n\}$. All vectors are assumed to be
column vectors, and $v^T$ for a vector $v$ denotes its transpose. Finally, we often implicitly assume
that various quantities are powers of 2 (such as e.g. $1/\delta$), which is without loss of generality.
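The two norms of Definitions 3 and 4 can be computed as in the following minimal sketch in plain Python. The operator norm is only estimated, by power iteration on $A^T A$ from a fixed start vector; the function names and the iteration count are assumptions of this example, and the estimate can fail if the start vector happens to be orthogonal to the top singular direction.

```python
import math

def frobenius_norm(a):
    """Square root of the sum of squared entries (Definition 3)."""
    return math.sqrt(sum(v * v for row in a for v in row))

def operator_norm(a, iters=500):
    """Estimate sup over unit x of ||Ax||_2 by power iteration on A^T A."""
    rows, n = len(a), len(a[0])
    x = [1.0 / math.sqrt(n)] * n          # fixed, generic start vector
    for _ in range(iters):
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(rows)]
        atax = [sum(a[i][j] * ax[i] for i in range(rows)) for j in range(n)]
        norm = math.sqrt(sum(v * v for v in atax))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in atax]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(rows)]
    return math.sqrt(sum(v * v for v in ax))
```

For a symmetric matrix the value returned by `operator_norm` approximates the largest eigenvalue magnitude, in line with the remark after Definition 4.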



4 Warmup: A simple proof of the JL lemma 

Before proving our main theorem, as a warmup we demonstrate how a simpler version of our
approach reproves Achlioptas' result [1] that the family of all (appropriately scaled) sign matrices
is a JL family. Furthermore, as was already demonstrated in [9, Theorem 2.2], we show that
rather than choosing a uniformly random sign matrix, the entries need only be $\Omega(\log(1/\delta))$-wise
independent.

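We first state the Hanson-Wright inequality [15], which gives a central moment bound for
quadratic forms in terms of both the Frobenius and operator norms of the associated matrix.^5

^5 [15] proves a tail bound, but it is not hard to then derive a moment bound via integration; see [12] for a direct
proof of the moment bound.

Lemma 5 (Hanson-Wright inequality [15]). Let $z = (z_1, \ldots, z_n)$ be a vector of i.i.d. Bernoulli $\pm 1$
random variables. Then for any symmetric $B \in \mathbb{R}^{n \times n}$ and integer $\ell \ge 2$ a power of 2,

\[ \mathbf{E}\left[ \left( z^T B z - \mathrm{trace}(B) \right)^\ell \right] \le 64^\ell \cdot \max\left\{ \sqrt{\ell} \cdot \|B\|_F,\ \ell \cdot \|B\|_2 \right\}^\ell. \]

Theorem 6. For $d > 0$ an integer and any $0 < \varepsilon, \delta < 1/2$, let $A$ be a $k \times d$ random matrix with
$\pm 1/\sqrt{k}$ entries that are $r$-wise independent for $k = \Theta(\varepsilon^{-2} \log(1/\delta))$ and $r = \Omega(\log(1/\delta))$. Then for
any $x \in \mathbb{R}^d$ with $\|x\|_2 = 1$,

\[ \Pr_A\left[\, \left| \|Ax\|_2^2 - 1 \right| > \varepsilon \,\right] < \delta. \]

Proof. We have

\[ \|Ax\|_2^2 = \frac{1}{k} \cdot \sum_{i=1}^{k} \left( \sum_{(s,t) \in [d] \times [d]} x_s x_t \sigma_{i,s} \sigma_{i,t} \right), \tag{1} \]

where $\sigma$ is a $kd$-dimensional vector formed by concatenating the rows of $\sqrt{k} \cdot A$. Define the matrix
$T \in \mathbb{R}^{kd \times kd}$ to be the block-diagonal matrix where each block equals $xx^T/k$. Then, $\|Ax\|_2^2 = \sigma^T T \sigma$.
Furthermore, $\mathrm{trace}(T) = \|x\|_2^2 = 1$. Thus, we would like to argue that $\sigma^T T \sigma$ is concentrated about
$\mathrm{trace}(T)$, for which we can use Lemma 5. Specifically, if $\ell \ge 2$ is even,

\[ \Pr\left[\, \left| \|Ax\|_2^2 - 1 \right| > \varepsilon \,\right] = \Pr\left[\, \left| \sigma^T T \sigma - \mathrm{trace}(T) \right| > \varepsilon \,\right] < \varepsilon^{-\ell} \cdot \mathbf{E}\left[ \left( \sigma^T T \sigma - \mathrm{trace}(T) \right)^\ell \right] \]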
by Markov's inequality. To apply Lemma 5, we also pick $\ell$ a power of 2, and we ensure $2\ell \le r$ so
that the $\ell$th moment of $\sigma^T T \sigma - \mathrm{trace}(T)$ is determined by $r$-wise independence of the $\sigma$ entries.
We also must bound $\|T\|_F$ and $\|T\|_2$. Direct computation gives $\|T\|_F^2 = (1/k) \cdot \|x\|_2^4 = 1/k$. Also,
$x$ is the only eigenvector of $xx^T/k$ with non-zero eigenvalue, and furthermore its eigenvalue is
$\|x\|_2^2/k = 1/k$, and thus $\|T\|_2 = 1/k$. Therefore,

\[ \Pr\left[\, \left| \|Ax\|_2^2 - 1 \right| > \varepsilon \,\right] < \varepsilon^{-\ell} \cdot 64^\ell \cdot \max\left\{ \sqrt{\frac{\ell}{k}},\ \frac{\ell}{k} \right\}^\ell, \tag{2} \]

which is at most $\delta$ for $\ell = \log(1/\delta)$ and $k \ge 4 \cdot 64^2 \cdot \varepsilon^{-2} \log(1/\delta)$.^6 ■

^6 Though our constant factor for $k$ is quite large, most likely the 64 could be made much smaller by tightening the
analysis of constants in [12].

Remark 7. The conclusion of Lemma 5 holds even if the $z_i$ are not necessarily Bernoulli but
rather have mean 0, variance 1, and sub-Gaussian tails, albeit with the "64" possibly replaced by
a different constant (see [15]). Thus, the above proof of Theorem 6 carries over unchanged to show
that $A$ could instead have $\Omega(\log(1/\delta))$-wise independent such $z_i$ as entries. We also direct the reader
to an older proof of this fact by Matousek [23], without discussion of independence requirements
(though independence requirements can most likely be calculated from his proof, by converting the
tail bounds he uses into moment bounds via integration).

5 Proof of Main Theorem

We recall the sparse JL transform construction of [10] (though the settings of some of our constants
differ). Let $k = 28 \cdot 64^2 \cdot \varepsilon^{-2} \log(1/\delta)$. Pick random hash functions $h : [d] \to [k]$ and $\sigma : [d] \to \{-1, 1\}$.
Let $\delta_{i,j}$ be the indicator random variable for the event $h(j) = i$. Define the matrix
$A \in \{-1, 0, 1\}^{k \times d}$ by $A_{i,j} = \delta_{i,j} \cdot \sigma(j)$. The work of [10] showed that as long as $x \in \mathbb{R}^d$ satisfies
$\|x\|_2 = 1$ and has bounded $\|x\|_\infty$, then $\Pr_{h,\sigma}[\,|\|Ax\|_2^2 - 1| > \varepsilon\,] < O(\delta)$. We show the same
conclusion without the assumption that $h, \sigma$ are perfectly random; in particular, we show that $h$
need only be $r_h$-wise independent and $\sigma$ need only be $r_\sigma$-wise independent for $r_h = O(\log(k/\delta))$
and $r_\sigma = O(\log(1/\delta))$. Furthermore, our assumption on the bound for $\|x\|_\infty$ is $\|x\|_\infty \le c$ for
$c = \Theta(\sqrt{\varepsilon/(\log(1/\delta) \cdot \log(k/\delta))})$, whereas [10] required $c = \Theta(\sqrt{\varepsilon/(\log(1/\delta) \cdot \log^2(k/\delta))})$. This is
relevant since the column sparsity obtained in the final JL transform construction of [10] is $1/c^2$.
This is because, to apply the dimensionality reduction of [10] to an arbitrary $x$ of unit $\ell_2$ norm
(which might have $\|x\|_\infty \gg c$), one should first map $x$ to a vector $\tilde{x}$ by a $(d/c^2) \times d$ matrix $Q$
with $Q_{i_1/c^2 + i_2,\, i_1 + 1} = c$ and other entries 0 for $i_1 \in \{0, \ldots, d - 1\}$, $i_2 \in [1/c^2]$. Then $\|\tilde{x}\|_2 = 1$ and
$\|\tilde{x}\|_\infty \le c$, and thus the set of products with $Q$ of JL matrices in the distribution of [10] over
dimension $d/c^2$ serves as a JL family for arbitrary unit vectors. Thus, the sparsity obtained by our
proof in the final JL construction is improved by a $\Theta(\log(k/\delta))$ factor.

Before proving our main theorem, first we note that

\[ \|Ax\|_2^2 = \|x\|_2^2 + 2 \sum_{s < t \,:\, h(s) = h(t)} x_s x_t\, \sigma(s) \sigma(t). \]
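As a sanity check on the construction recalled in this section, the following minimal sketch builds the matrix with a single signed non-zero per column implicitly from `h` and `sigma`, applies it to a vector, and compares $\|Ax\|_2^2$ with $\|x\|_2^2$ plus twice the sum of $x_s x_t \sigma(s) \sigma(t)$ over colliding pairs $s < t$. The hash functions here are fully random tables and the sizes are illustrative assumptions; nothing in this example tests bounded independence.

```python
import random

def sparse_jl_apply(x, h, sigma, k):
    """Compute Ax for the matrix with A[h[j]][j] = sigma[j] and zeros elsewhere."""
    y = [0.0] * k
    for j, xj in enumerate(x):
        if xj != 0.0:                 # only non-zero entries of x cost time
            y[h[j]] += sigma[j] * xj
    return y

def cross_term(x, h, sigma):
    """2 * sum over s < t with h[s] == h[t] of x_s x_t sigma_s sigma_t."""
    d = len(x)
    return 2.0 * sum(x[s] * x[t] * sigma[s] * sigma[t]
                     for s in range(d) for t in range(s + 1, d)
                     if h[s] == h[t])

rng = random.Random(1)                # illustrative seed and sizes
d, k = 30, 8
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
h = [rng.randrange(k) for _ in range(d)]
sigma = [rng.choice((-1, 1)) for _ in range(d)]
lhs = sum(v * v for v in sparse_jl_apply(x, h, sigma, k))
rhs = sum(v * v for v in x) + cross_term(x, h, sigma)
```

The two quantities `lhs` and `rhs` agree up to floating-point error for any choice of `h` and `sigma`, since the identity is purely algebraic.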
We would like that $\|Ax\|_2^2$ is concentrated about 1, or rather, that

\[ Z = 2 \sum_{s < t} \left( \sum_{j=1}^{k} \delta_{s,j} \delta_{t,j} \right) x_s x_t\, \sigma(s) \sigma(t) \tag{3} \]

is concentrated about 0. Let $\eta_{s,t}$ be the indicator random variable for the event $s \ne t$ and $h(s) = h(t)$.
Then for fixed $h$, $Z$ is a quadratic form in the $\sigma(i)$ which can be written as $\sigma^T T \sigma$ for a $d \times d$
matrix $T$ with $T_{s,t} = x_s x_t \eta_{s,t}$ (we here and henceforth slightly abuse notation by sometimes using
$\sigma$ to also denote the $d$-dimensional vector whose $i$th entry is $\sigma(i)$).

Our main theorem follows by applying Lemma 5 to $\sigma^T T \sigma$, as in the proof of Theorem 6 in
Section 4, to show that $Z$ is concentrated about $\mathrm{trace}(T) = 0$. However, unlike in Section 4, our
matrix $T$ is not a fixed matrix, but rather is random; it depends on the random choice of $h$. We
handle this issue by using the two lemmas below, which state that both $\|T\|_F$ and $\|T\|_2$ are small
with high probability over the random choice of $h$. We then obtain our main theorem by first
conditioning on this high probability event before applying Lemma 5. The lemmas are proven in
Section 6 and Section 7.

Henceforth in this paper, we assume $\|x\|_2 = 1$, $\|x\|_\infty \le c$, and $T$ is the matrix described above.

Lemma 8. $\Pr_h[\|T\|_F^2 > 7/k] < \delta$.

Lemma 9. $\Pr_h[\|T\|_2 > \varepsilon/(128 \cdot \log(1/\delta))] < \delta$.

The following theorem now implies our main theorem (Theorem 2).

Theorem 10.

\[ \Pr_{h,\sigma}\left[\, \left| \|Ax\|_2^2 - 1 \right| > \varepsilon \,\right] < 3\delta. \]

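Proof. Write

\[ \|Ax\|_2^2 = \|x\|_2^2 + 2 \sum_{s < t} x_s x_t \eta_{s,t}\, \sigma(s) \sigma(t) = 1 + Z. \]

We will show $\Pr_{h,\sigma}[|Z| > \varepsilon] < 3\delta$. Condition on $h$, and let $\mathcal{E}$ be the event that $\|T\|_F^2 \le 7/k$
and $\|T\|_2 \le \varepsilon/(128 \cdot \log(1/\delta))$. By applications of Lemma 8 and Lemma 9 and a union bound,

\[ \Pr_{h,\sigma}[|Z| > \varepsilon] < \Pr_\sigma[|Z| > \varepsilon \mid \mathcal{E}] + 2\delta. \]

By a Markov bound applied to the random variable $Z^\ell$ for $\ell$ an even integer,

\[ \Pr_\sigma[|Z| > \varepsilon \mid \mathcal{E}] < \mathbf{E}_\sigma[Z^\ell \mid \mathcal{E}] / \varepsilon^\ell. \]

Since $Z = \sigma^T T \sigma$ and $\mathrm{trace}(T) = 0$, applying Lemma 5 with $B = T$ and $2\ell \le r_\sigma$ gives

\[ \Pr_\sigma[|Z| > \varepsilon \mid \mathcal{E}] < 64^\ell \cdot \max\left\{ \varepsilon^{-1} \sqrt{\frac{7\ell}{k}},\ \frac{\ell}{128 \cdot \log(1/\delta)} \right\}^\ell, \tag{4} \]

since the $\ell$th moment is determined by $r_\sigma$-wise independence of $\sigma$. We conclude the proof by noting
that the expression in Eq. (4) is at most $\delta$ for $\ell = \log(1/\delta)$. ■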
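The arithmetic behind the last step can be checked numerically. The sketch below evaluates the Markov-plus-Lemma-5 bound $\varepsilon^{-\ell} \cdot 64^\ell \cdot \max\{\sqrt{\ell}\,\|T\|_F,\ \ell\,\|T\|_2\}^\ell$ at the extreme values allowed on the conditioned event, namely $\|T\|_F^2 = 7/k$ and $\|T\|_2 = \varepsilon/(128 \log(1/\delta))$, with $k = 28 \cdot 64^2 \cdot \varepsilon^{-2} \log(1/\delta)$. The specific values of `eps` and `delta` and the function name are assumptions chosen for the example.

```python
import math

def conditional_tail_bound(eps, frob_sq, op_norm, ell):
    """eps^-ell * 64^ell * max(sqrt(ell) * ||T||_F, ell * ||T||_2)^ell."""
    base = 64.0 * max(math.sqrt(ell * frob_sq), ell * op_norm) / eps
    return base ** ell

eps, delta = 0.1, 2.0 ** -10                   # illustrative parameters
ell = int(math.log2(1.0 / delta))              # ell = log(1/delta)
k = 28 * 64 ** 2 * ell / eps ** 2              # k as set in Section 5
bound = conditional_tail_bound(eps, 7.0 / k, eps / (128 * ell), ell)
```

Both arguments of the maximum equal $\varepsilon/128$ at these extreme values, so the base is $1/2$ and the bound is $2^{-\ell} = \delta$ up to floating-point rounding.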
Remark 11. In the proof of Theorem 10, rather than condition on $\mathcal{E}$ we can directly bound the
$O(\log(1/\delta))$th moment of $Z$ over the randomness of both $h$ and $\sigma$ simultaneously. In this case, we
use the Frobenius and operator norm moments from Eq. (8) and Eq. (10) directly. This gives a bound
on $\Pr_{h,\sigma}[|Z| > \varepsilon]$ which holds as long as $h, \sigma$ are $\ell$-wise independent. One can then set
$\ell = O(\log(1/\delta))$ and $c = \Theta((\sqrt{\varepsilon}/\log(1/\delta)) \cdot \varepsilon^{2/\log(1/\delta)}) = \Theta(\sqrt{\varepsilon^{1+o_\delta(1)}}/\log(1/\delta))$
to make the above probability at most $\delta$.




6 A high probability bound on $\|T\|_F$

In this section we prove Lemma 8.

Proof (of Lemma 8). Recall that for $s, t \in [d]$, $\eta_{s,t}$ is the random variable indicating that $s \ne t$
and $h(s) = h(t)$. Then, Eq. (3) implies that $\|T\|_F^2 = 2 \sum_{s < t} x_s^2 x_t^2 \eta_{s,t}$. Note $\|T\|_F^2$ is a random
variable depending only on $h$. The plan of our proof is to directly bound the $\ell$th moment of $\|T\|_F^2$
for some large $\ell$ (specifically, $\ell = \Theta(\log(1/\delta))$), then conclude by applying Markov's inequality to
the random variable $(\|T\|_F^2)^\ell$. We bound the $\ell$th moment of $\|T\|_F^2$ via some combinatorics.

We now give the details of our proof. Consider the expansion

\[ \left( \|T\|_F^2 \right)^\ell = 2^\ell \cdot \sum_{\substack{s_1, \ldots, s_\ell \\ t_1, \ldots, t_\ell \\ s_i < t_i}} \ \prod_{i=1}^{\ell} x_{s_i}^2 x_{t_i}^2 \eta_{s_i, t_i}. \tag{5} \]



Let $\mathcal{G}_\ell$ be the set of all isomorphism classes of graphs (possibly containing multi-edges) with between
2 and $2\ell$ unlabeled vertices, minimum degree at least 1, and exactly $\ell$ edges with distinct labels in
$[\ell]$. We now define a map $f : \binom{[d]}{2}^\ell \to \mathcal{G}_\ell$, where the notation $\binom{U}{r}$ denotes subsets of $U$ of size
$r$; i.e. $f$ maps the monomials in Eq. (5) to elements of $\mathcal{G}_\ell$. Focus on one monomial in Eq. (5) and
let $S = \{s_1, \ldots, s_\ell, t_1, \ldots, t_\ell\}$. We map the monomial to an $|S|$-vertex element of $\mathcal{G}_\ell$ as follows:
associate each $u \in S$ with a vertex, and for each $s_i, t_i$, draw an edge between the vertices associated
with $s_i, t_i$ using edge label $i$.

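We now analyze the expectation of the summation in Eq. (5) by grouping monomials which
map to the same elements of $\mathcal{G}_\ell$ under $f$.

\[ \mathbf{E}_h\left[ \left( \|T\|_F^2 \right)^\ell \right] = 2^\ell \cdot \sum_{G \in \mathcal{G}_\ell} \ \sum_{\substack{\{(s_i, t_i)\} \in \binom{[d]}{2}^\ell \\ f(\{(s_i, t_i)\}) = G}} \left( \prod_{i=1}^{\ell} x_{s_i}^2 x_{t_i}^2 \right) \cdot \mathbf{E}_h\left[ \prod_{i=1}^{\ell} \eta_{s_i, t_i} \right]. \tag{6} \]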



Observe that $\prod_{i=1}^{\ell} \eta_{s_i, t_i}$ is determined by $h(s_i), h(t_i)$ for each $i \in [\ell]$, and hence its expectation
is determined by $2\ell$-wise independence of $h$. Note that this product is 1 if $s_i$ and $t_i$ hash to the
same element for each $i$ and is 0 otherwise. Each pair hashes to the same element if and only
if for each connected component of $G$, all elements of $S = \{s_1, \ldots, s_\ell, t_1, \ldots, t_\ell\}$ corresponding to
vertices in that component hash to the same value. For the $v_G$ elements we are concerned with,
where $v_G = |S|$ is the number of vertices in $G$, we can choose one element of $[k]$ for each connected
component. Hence the number of possible values of $h$ on $S$ that cause $\prod_{i=1}^{\ell} \eta_{s_i, t_i}$ to be 1 is $k^{m_G}$,
where $G$ has $m_G$ connected components. Each possibility happens with probability $k^{-v_G}$. Hence

\[ \mathbf{E}_h\left[ \prod_{i=1}^{\ell} \eta_{s_i, t_i} \right] = k^{m_G - v_G}. \]

Also, consider the term $\prod_{i=1}^{\ell} x_{s_i}^2 x_{t_i}^2 = \prod_{i=1}^{v_G} x_{r_i}^{2\ell_i}$, where $S = \{r_i\}_{i=1}^{v_G}$, each $\ell_i$ is at least 1, and
$\sum_i \ell_i = 2\ell$ ($\ell_i$ is just the degree of the vertex associated with $r_i$ in $G$). Then,

\[ \prod_{i=1}^{v_G} x_{r_i}^{2\ell_i} = \left( \prod_{i=1}^{v_G} x_{r_i}^{2\ell_i - 2} \right) \cdot \left( \prod_{i=1}^{v_G} x_{r_i}^2 \right) \le c^{2(2\ell - v_G)} \cdot \prod_{i=1}^{v_G} x_{r_i}^2. \]

Note then that the monomials $\prod_{i=1}^{v_G} x_{r_i}^2$ that arise from the summation over $\{(s_i, t_i)\} \in \binom{[d]}{2}^\ell$ with
$f(\{(s_i, t_i)\}) = G$ in Eq. (6) are a subset of those monomials which appear in the expansion of
$(\sum_{i=1}^{d} x_i^2)^{v_G} = 1$. Thus, plugging back into Eq. (6),

\[ \mathbf{E}_h\left[ \left( \|T\|_F^2 \right)^\ell \right] \le 2^\ell \cdot \sum_{G \in \mathcal{G}_\ell} \frac{c^{2(2\ell - v_G)}}{k^{v_G - m_G}}. \tag{7} \]
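The counting step that gives collision probability $k^{m_G - v_G}$ can be checked by brute force on small instances. The sketch below is an illustration under the assumption of a fully random $h$ (under which the claim holds exactly): it computes the number of vertices and connected components of the graph formed by a list of pairs using union-find, and separately enumerates every assignment of hash values to the vertices. The function names are hypothetical.

```python
from itertools import product

def components(pairs):
    """Return (v_G, m_G): the number of vertices and of connected components."""
    parent = {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    for s, t in pairs:
        parent.setdefault(s, s)
        parent.setdefault(t, t)
        parent[find(s)] = find(t)
    return len(parent), len({find(u) for u in parent})

def collision_probability(pairs, k):
    """Exact Pr[h(s_i) == h(t_i) for all i] for a fully random h into k buckets."""
    verts = sorted({u for p in pairs for u in p})
    good = 0
    for values in product(range(k), repeat=len(verts)):
        h = dict(zip(verts, values))
        good += all(h[s] == h[t] for s, t in pairs)
    return good / k ** len(verts)
```

For example, the pairs `[(1, 2), (2, 3), (5, 6)]` form a graph with 5 vertices and 2 components, and the enumeration agrees with $k^{2 - 5}$.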



Note the value $\ell$ in the $c^{2(2\ell - v_G)}$ term just arose as $e_G$, the number of edges in $G$. We bound
the above summation by considering all ways to form an element of $\mathcal{G}_\ell$ by adding one edge at a
time, starting from the empty graph $G_0$ with zero vertices and edges. In fact we will overcount
some $G \in \mathcal{G}_\ell$, but this is acceptable since we only want an upper bound on Eq. (7).

Define $F(G) = c^{2(2e_G - v_G)} / k^{v_G - m_G}$. Initially we have $F(G_0) = 1$. We will add $\ell$ edges in order by
label, from label 1 to $\ell$. For the $i$th edge we have three options to form $G_i$ from $G_{i-1}$: (a) we can add
the edge between two existing vertices in $G_{i-1}$, (b) we can add two new vertices to $G_{i-1}$ and place
the edge between them, or (c) we can create one new vertex and connect it to an already-existing
vertex of $G_{i-1}$. For each of these three options, we will argue that $n_i \cdot F(G_i)/F(G_{i-1}) \le 1/k$, where
$n_i$ is the number of ways to perform the operation we chose at step $i$. This implies that the right
hand side of Eq. (7) is at most $(6/k)^\ell$ since at each step of forming an element of $\mathcal{G}_\ell$ we have three
options for how to form $G_i$ from $G_{i-1}$.

Let $e$ be the number of edges, $v$ the number of vertices, and $m$ the number of connected
components for some $G_{i-1}$. In option (a), $v$ remains constant, $e$ increases by 1, and $m$ either
remains constant or decreases by 1. In any case, $F(G_i)/F(G_{i-1}) \le c^4$, and $n_i < 2\ell^2$; the latter
is because we have $\binom{v}{2} < 2\ell^2$ choices of vertices to connect. In option (b), $n_i = 1$, $v$ increases
by 2, $e$ increases by 1, and $m$ increases by 1, implying $n_i \cdot F(G_i)/F(G_{i-1}) = 1/k$. Finally, in
option (c), $n_i = v \le 2\ell$, $v$ increases by 1, $e$ increases by 1, and $m$ remains constant, implying
$n_i \cdot F(G_i)/F(G_{i-1}) \le 2\ell c^2/k$. Thus, regardless of which of the three options we choose,
$n_i \cdot F(G_i)/F(G_{i-1}) \le \max\{2\ell^2 c^4, 1/k, 2\ell c^2/k\}$, which is $1/k$ for $\ell = O(\log(1/\delta))$.

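As discussed above, when combined with Eq. (7) this gives

\[ \mathbf{E}_h\left[ \left( \|T\|_F^2 \right)^\ell \right] \le (6/k)^\ell. \tag{8} \]

Then, by Markov's inequality on the random variable $(\|T\|_F^2)^\ell$ for $\ell \ge 2$ and even, and assuming
$2\ell \le r_h$,

\[ \Pr_h\left[ \|T\|_F^2 > 7/k \right] < (k/7)^\ell \cdot \mathbf{E}_h\left[ \left( \|T\|_F^2 \right)^\ell \right] \le (6/7)^\ell, \]

which is at most $\delta$ for $\ell = \Theta(\log(1/\delta))$. ■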
7 A high probability bound on $\|T\|_2$

In this section we prove Lemma 9. For each $j \in [k]$ we use $\alpha_j$ to denote $\sum_{i \in [d],\, h(i) = j} x_i^2$.

Lemma 12. $\|T\|_2 \le \max\{c^2, \max_{j \in [k]} \alpha_j\}$.



Proof. Define the diagonal matrix $R$ with $R_{i,i} = x_i^2$, and put $S = T + R$. For each $j \in [k]$,
consider the vector $v_j$ whose support is $h^{-1}(j)$, with $(v_j)_i = x_i$ for each $i$ in its support. Then

\[ S = \sum_{j=1}^{k} v_j v_j^T. \]

Thus $\mathrm{rank}(S)$ is equal to the number of non-zero $v_j$, since they are clearly
linearly independent (they have disjoint support and are thus orthogonal) and span the image of $S$.
Furthermore, these non-zero $v_j$ are eigenvectors of $S$ since $S v_j = \alpha_j v_j$, and are the only eigenvectors
of $S$ with non-zero eigenvalue since if $u$ is perpendicular to all such $v_j$ then $Su = 0$.

Now,

\[ \|T\|_2 = \sup_{\|y\|_2 = 1} |y^T T y| = \sup_{\|y\|_2 = 1} |y^T S y - y^T R y|. \]

Since $S, R$ are both positive semidefinite, we then have $\|T\|_2 \le \max\{\|S\|_2, \|R\|_2\}$. $\|R\|_2$ is clearly
$\|x\|_\infty^2 \le c^2$, and we saw above that $\|S\|_2 = \max_{j \in [k]} \alpha_j$. ■

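Proof (of Lemma 9). Fix some $j \in [k]$. Define $X_i = x_i^2 \delta_{i,j}$ (here $\delta_{i,j}$ indicates the event
$h(i) = j$) so that $\alpha_j = \sum_{i=1}^{d} X_i$. Then

\[ \mathbf{E}_h\left[ \alpha_j^\ell \right] = \sum_{1 \le s_1, \ldots, s_\ell \le d} \mathbf{E}_h\left[ \prod_{i=1}^{\ell} X_{s_i} \right]. \tag{9} \]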



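Let $\mathcal{V}_\ell$ be the set of length-$\ell$ vectors $v$ with non-negative integer entries such that if $r > 0$ appears
as an entry of $v$, then at least one appearance of $r - 1$ is in $v$ at an earlier index. Define the
map $f : [d]^\ell \to \mathcal{V}_\ell$ as follows: a vector $w \in [d]^\ell$ maps to the vector where for each $i \in [\ell]$, if $w_i$ is
the $r$th distinct value (0-based indexing) to appear in $w$ then we replace $w_i$ with $r$. For example,
$f((14, 1, 4, 14)) = (0, 1, 2, 0)$. We group the monomials in Eq. (9) by equal images under $f$. That is,

\[ \mathbf{E}_h\left[ \alpha_j^\ell \right]
 = \sum_{v \in \mathcal{V}_\ell} \ \sum_{\substack{1 \le s_1, \ldots, s_\ell \le d \\ f((s_1, \ldots, s_\ell)) = v}} \mathbf{E}_h\left[ \prod_{i=1}^{\ell} X_{s_i} \right]
 = \sum_{v \in \mathcal{V}_\ell} \ \sum_{\substack{1 \le s_1, \ldots, s_\ell \le d \\ f((s_1, \ldots, s_\ell)) = v}} \left( \prod_{i=1}^{\ell} x_{s_i}^2 \right) \cdot \mathbf{E}_h\left[ \prod_{i=1}^{\ell} \delta_{s_i, j} \right] \]
\[ = \sum_{v \in \mathcal{V}_\ell} \ \sum_{\substack{1 \le s_1, \ldots, s_\ell \le d \\ f((s_1, \ldots, s_\ell)) = v}} \left( \prod_{i=1}^{\ell} x_{s_i}^2 \right) \cdot k^{-m_v}
 \le \sum_{v \in \mathcal{V}_\ell} c^{2(\ell - m_v)} \cdot k^{-m_v}, \]

where the penultimate equality holds if $\ell \le r_h$, and $m_v$ is the number of distinct values amongst
the entries of $v$. The final inequality holds since, pulling out a factor of $c^2$ for each multiple occurrence
of any $s_i$, for a fixed $v$ the remaining terms all show up in the expansion of $(\|x\|_2^2)^{m_v} = 1$.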
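The map $f$ that sends a vector to its pattern of first appearances is easy to state in code. The following minimal sketch implements it and, as a small check of the grouping, collects all patterns that arise from vectors in $[d]^\ell$ for tiny parameters; the function names are assumptions of this example.

```python
from itertools import product

def pattern(w):
    """Replace each entry of w by the 0-based rank of its first appearance."""
    first_seen = {}
    out = []
    for value in w:
        if value not in first_seen:
            first_seen[value] = len(first_seen)
        out.append(first_seen[value])
    return tuple(out)

def all_patterns(d, ell):
    """The set of images of [d]^ell under the pattern map."""
    return {pattern(w) for w in product(range(1, d + 1), repeat=ell)}
```

Calling `pattern((14, 1, 4, 14))` reproduces the example from the text, and the number of distinct values in a pattern is the quantity written $m_v$ above.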

Now we bound the summation above. Begin with the empty sequence $v_0 = ()$ (in $\mathcal{V}_0$).
We will arrive at some $v_\ell \in \mathcal{V}_\ell$ by appending an entry one at a time. In transitioning from $v_{i-1}$ to
$v_i$ we can either (i) repeat an entry that already appeared in $v_{i-1}$, or (ii) add a new entry (whose
identity is unique: it must be the next largest integer which has not appeared in $v_{i-1}$). For (i) there
are $m_{v_{i-1}} < \ell$ ways to choose a pre-existing integer to repeat, $i$ increases by 1, and $m_{v_i} = m_{v_{i-1}}$,
and thus we gain a factor of $c^2 \ell$. For (ii), there is one way to choose a new integer to appear, $i$
increases by 1, and $m_{v_i} = m_{v_{i-1}} + 1$, and thus we gain a factor of $1/k$. Since at each step we have
two options to choose from (either perform (i) or (ii)), $\mathbf{E}_h[\alpha_j^\ell] \le 2^\ell \cdot \max\{1/k, c^2 \ell\}^\ell$. We then have

\[ \mathbf{E}_h\left[ \|T\|_2^\ell \right] \le \mathbf{E}_h\left[ \max_{j \in [k]} \alpha_j^\ell \right] \le \sum_{j=1}^{k} \mathbf{E}_h\left[ \alpha_j^\ell \right] \le k \cdot 2^\ell \cdot \max\{1/k, c^2 \ell\}^\ell. \tag{10} \]

The lemma follows by a Markov bound with $\ell = O(\log(k/\delta))$, i.e. $\Pr_h[\|T\|_2 > \lambda] < \lambda^{-\ell} \cdot \mathbf{E}_h[\|T\|_2^\ell]$,
and we set $\lambda = \varepsilon/(128 \cdot \log(1/\delta))$. ■

Remark 13. One could integrate the Bernstein inequality tail bound to obtain a moment bound
which applies to $\mathbf{E}_h[\alpha_j^\ell]$ in the proof of Lemma 9. The conclusion would not improve. We chose to
give an elementary proof to be self-contained.
Appendix

A Optimality of Lemma 1

Jayram and Woodruff gave a proof that the $k = \Theta(\varepsilon^{-2} \log(1/\delta))$ in Lemma 1 is optimal [19]. Their
proof went through communication and information complexity. We here give another proof of this
fact, via some linear algebra and direct calculations.

Note that for a distribution $\mathcal{D}$, if $\Pr_{A \sim \mathcal{D}}[\,|\|Ax\|_2^2 - 1| > \varepsilon\,] < \delta$ for any $x \in S^{d-1}$, then it must be
the case that $\Pr_{A \sim \mathcal{D},\, x \in S^{d-1}}[\,|\|Ax\|_2^2 - 1| > \varepsilon\,] < \delta$. The following theorem shows that no $A \in \mathbb{R}^{k \times d}$
can have $\Pr_{x \in S^{d-1}}[\,|\|Ax\|_2^2 - 1| > \varepsilon\,] < \delta$ unless $k$ is at least as large as in the statement of Lemma 1.

Theorem 14. If $A : \mathbb{R}^d \to \mathbb{R}^k$ is a linear transformation with $d > 2k$ and $\varepsilon > 0$ sufficiently small,
then for $x$ a randomly chosen vector in $S^{d-1}$, $\Pr[\,|\|Ax\|_2^2 - 1| > \varepsilon\,] > \exp(-O(k\varepsilon^2 + 1))$.

Proof. First we note that we can assume that $A$ is surjective since if it is not, we may replace
$\mathbb{R}^k$ by the image of $A$. Let $V = \ker(A)$ and let $W = V^\perp$. Then $\dim(W) = k$, $\dim(V) = d - k$.
Now, any $x \in \mathbb{R}^d$ can be written uniquely as $x_V + x_W$ where $x_V$ and $x_W$ are the components
in $V$ and $W$ respectively. We may then write $x_V = r_V \Omega_V$, $x_W = r_W \Omega_W$, where $r_V, r_W$ are
positive real numbers and $\Omega_V$ and $\Omega_W$ are unit vectors in $V$ and $W$ respectively. Let $s_V = r_V^2$
and $s_W = r_W^2$. We may now parameterize the unit sphere by $(s_V, \Omega_V, s_W, \Omega_W) \in [0, 1] \times S^{d-k-1} \times
[0, 1] \times S^{k-1}$, so that $s_V + s_W = 1$. It is clear that the uniform measure on the sphere is given
in these coordinates by $f(s_W)\, ds_W\, d\Omega_V\, d\Omega_W$ for some function $f$. To compute $f$ we note that
$f(s_W)$ should be proportional to the limit as $\delta_1, \delta_2 \to 0^+$ of $(\delta_1 \delta_2)^{-1}$ times the volume of points $x$
so that $\|x\|_2^2 \in [1, 1 + \delta_1]$ and $\|x_W\|_2^2 \in [s_W, s_W + \delta_2]$. Equivalently, $\|x_W\|_2^2 \in [s_W, s_W + \delta_2]$, and
$\|x_V\|_2^2 \in [1 - \|x_W\|_2^2, 1 - \|x_W\|_2^2 + \delta_1]$. For fixed $x_W$, the latter volume is within $O(\delta_1 \delta_2)$ of the volume
of $x_V$ so that $\|x_V\|_2^2 \in [s_V, s_V + \delta_1]$. Now the measure on $V$ is $r_V^{d-k-1}\, dr_V\, d\Omega_V$. Therefore it also is
$\frac{1}{2} s_V^{(d-k-2)/2}\, ds_V\, d\Omega_V$. Therefore this volume over $V$ is proportional to $s_V^{(d-k-2)/2} (\delta_1 + O(\delta_1 \delta_2 + \delta_1^2))$.
Similarly the volume of $x_W$ so that $\|x_W\|_2^2 \in [s_W, s_W + \delta_2]$ is proportional to $s_W^{(k-2)/2} (\delta_2 + O(\delta_2^2))$.
Hence $f$ is proportional to $s_V^{(d-k-2)/2} s_W^{(k-2)/2}$.

We are now prepared to prove the theorem. The basic idea is to first condition on $\Omega_V, \Omega_W$.
We let $C = \|A\Omega_W\|_2^2$. Then if $x$ is parameterized by $(s_V, \Omega_V, s_W, \Omega_W)$, $\|Ax\|_2^2 = C s_W$. Choosing
$x$ randomly, we know that $s = s_W$ satisfies the distribution with density $f(s) \propto s^{(k-2)/2} (1 - s)^{(d-k-2)/2}$ on
$[0, 1]$. We need to show that for any $c = 1/C$, the probability that $s$ is not in $[(1 - \varepsilon)c, (1 + \varepsilon)c]$
is $\exp(-O(\varepsilon^2 k))$. Note that $f(s)$ attains its maximum value at $s_0 = \frac{k-2}{d-4} < \frac{1}{2}$. Notice that
$\log(f(s_0(1 + x)))$ is some constant plus $\frac{k-2}{2} \log(s_0(1 + x)) + \frac{d-k-2}{2} \log(1 - s_0 - x s_0)$. If $|x| < 1/2$,
then this is some constant plus $-O(kx^2)$. So for such $x$, $f(s_0(1 + x)) = f(s_0) \exp(-O(kx^2))$.
Furthermore, for all $x$, $f(s_0(1 + x)) = f(s_0) \exp(-\Omega(kx^2))$. This says that $f$ is bounded above by
a normal distribution and checking the normalization we find that $f(s_0) = \Omega(s_0^{-1} k^{1/2})$.

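We now show that both $\Pr(s < (1 - \varepsilon)s_0)$ and $\Pr(s > (1 + \varepsilon)s_0)$ are reasonably large. We can
lower bound either as

\[ s_0 \int_{\varepsilon}^{1/2} f(s_0) \exp(-O(kx^2))\, dx \ \ge\ \Omega(k^{1/2}) \int_{\varepsilon}^{1/2} \exp(-O(kx^2))\, dx \ \ge\ \exp(-O(k\varepsilon^2 + 1)). \]

Hence since one of these intervals is disjoint from $[(1 - \varepsilon)c, (1 + \varepsilon)c]$, the probability that $s$ is not
in $[(1 - \varepsilon)c, (1 + \varepsilon)c]$ is at least $\exp(-O(k\varepsilon^2 + 1))$. ■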
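The distribution of $s = \|x_W\|_2^2$ used in this proof can be illustrated by simulation. The sketch below is an assumption-laden illustration: by rotational symmetry it takes $W$ to be the span of the first $k$ coordinates, samples uniform points on the sphere by normalizing Gaussian vectors, and records $s$. The density $f(s) \propto s^{(k-2)/2} (1 - s)^{(d-k-2)/2}$ has mean $k/d$, which the empirical mean should approximate. The sizes, seed, and trial count are illustrative.

```python
import random

def sample_s(d, k, rng):
    """Draw x uniformly from the unit sphere in R^d and return the squared
    norm of its first k coordinates (W is taken to be a coordinate subspace)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    total = sum(v * v for v in g)
    return sum(v * v for v in g[:k]) / total

rng = random.Random(4)                 # illustrative seed and sizes
d, k, trials = 20, 5, 2000
samples = [sample_s(d, k, rng) for _ in range(trials)]
mean_s = sum(samples) / trials
```

With these parameters the empirical mean should be close to $k/d = 0.25$, and the spread of the samples around that value reflects the $\exp(-\Theta(kx^2))$ decay of $f$ near its maximum.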



B A JL Lemma derandomization

We give an explicit JL family with seed length $O(\log d + \log(1/\varepsilon) \log(1/\delta) + \log(1/\delta) \log\log(1/\delta))$.
This seed length is always at least as good as the $O(\log(1/\delta) \log d)$ seed length coming from
$k$-wise independence, but can be much better for some settings of parameters.

The idea is simply to gradually reduce the dimension. Consider values $\varepsilon', \delta' > 0$ which we will
pick later. Define $t_j = \delta'^{-1/2^j}$. We embed $\mathbb{R}^d$ into $\mathbb{R}^{k_1}$ for $k_1 = \varepsilon'^{-2} t_1$. We then embed $\mathbb{R}^{k_{j-1}}$ into
$\mathbb{R}^{k_j}$ for $k_j = \varepsilon'^{-2} t_j$ until the point $j = j^* = O(\log(1/\delta')/\log\log(1/\delta'))$ where $t_{j^*} = \mathrm{polylog}(1/\delta')$.
We then embed $\mathbb{R}^{k_{j^*}}$ into $\mathbb{R}^k$ for $k = O(\varepsilon'^{-2} \log(1/\delta'))$.

The embeddings into each $k_j$ are performed by picking a Bernoulli matrix with $r_j$-wise independent
entries, as in Theorem 6. To achieve error probability $\delta'$ of having distortion larger than $(1 + \varepsilon')$
in the $j$th step, Eq. (2) in the proof of Theorem 6 tells us we need $r_j = O(\log(1/\delta')/\log(t_j/\log(1/\delta')))$.
Thus, in the first embedding into $\mathbb{R}^{k_1}$ we need $O(\log d)$ random bits. Then in the future embeddings
except the last, we need $O(2^j \cdot (\log(1/\varepsilon') + 2^{-j} \log(1/\delta')))$ random bits to embed into $\mathbb{R}^{k_j}$.
In the final embedding we require $O(\log(1/\delta') \cdot (\log(1/\varepsilon') + \log\log(1/\delta')))$ random bits. Thus, in
total, we have used $O(\log d + \log(1/\varepsilon') \log(1/\delta') + \log(1/\delta') \log\log(1/\delta'))$ bits of seed to achieve
error probability $O(\delta' \cdot j^*)$ of distortion $(1 + \varepsilon')^{j^*}$. The following theorem thus follows by applying
this argument with error probability $\delta' = \delta/(j^* + 1)$ and distortion parameter $\varepsilon' = \Theta(\varepsilon/j^*)$.

Theorem 15. For any $0 < \varepsilon, \delta < 1/2$ there exists an explicit JL family with seed length $s =
O(\log d + \log(1/\varepsilon) \log(1/\delta) + \log(1/\delta) \log\log(1/\delta))$. Given a seed and a vector $x \in \mathbb{R}^d$, the embedding
can be performed in polynomial time.
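The shape of the iterative schedule can be seen from a short computation. The sketch below lists $t_j = \delta'^{-1/2^j}$ and $k_j = \varepsilon'^{-2} t_j$ for $j = 1, 2, \ldots$ until $t_j$ drops below a stopping threshold. The threshold `stop_t` is a hypothetical stand-in for the polylogarithmic cutoff in the text, and the function name and parameter values are assumptions of this example.

```python
import math

def dimension_schedule(eps_p, delta_p, stop_t):
    """Return a list of (j, t_j, k_j) with t_j = delta'^(-1/2^j) and
    k_j = eps'^-2 * t_j, stopping at the first j with t_j <= stop_t."""
    if stop_t <= 1.0:
        raise ValueError("stop_t must exceed 1, since t_j tends to 1 from above")
    out = []
    j = 1
    while True:
        t_j = delta_p ** (-1.0 / 2 ** j)
        out.append((j, t_j, t_j / eps_p ** 2))
        if t_j <= stop_t:
            return out
        j += 1
```

For instance, with $\delta' = 2^{-64}$ the values $t_1, t_2, t_3, t_4$ are roughly $2^{32}, 2^{16}, 2^{8}, 2^{4}$: each step takes a square root of the previous target dimension, which is why only a small number of levels is needed.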

Remark 16. One may worry that along the way we are embedding into potentially very large
dimension (e.g. $1/\delta$ may be very large), so that our overall running time could be exponentially large.
However, we can simply start the above iterative embedding at the level $j$ where $\varepsilon'^{-2} t_j < d$.